barRoso
  • Overview
  • Get Started
  • Articles
  • News
  • Reference
  • Authors
  • Help
    • Report a Bug
    • Ask a Question
    • FAQ

jabotR hex sticker JBRJ logo


barRoso is is an R package for standardizing and reconciling plant specimen data from herbarium collections and biodiversity repositories. It was originally developed for cleaning records from GBIF but is fully compatible with other sources, including SEINet, JABOT, REFLORA, and speciesLink. The name is both a tribute to Brazilian botanist Graziela Maciel Barroso and an acronym for Biodiversity Analysis and Record Reconciliation for Organizing Specimen Observations.

The package focuses on harmonizing collector names, standardizing taxonomic and geographic fields, and identifying duplicates across distributed herbarium records. It supports Darwin Core standards, integrates seamlessly with tidyverse workflows, and enables reproducible biodiversity data pipelines.

Unlike many packages that rely on static dictionaries to clean collector names, barRoso uses a robust set of regular expressions (regex) to dynamically identify and standardize collector patterns across datasets. This approach allows barRoso to generalize better across sources and spelling variations, even when names are inconsistently formatted. Combined with harmonization of geographic and taxonomic fields, this makes the package especially powerful for detecting duplicates and preparing data for downstream analysis.

While many data tools prioritize aggressive cleaning, often at the cost of discarding valuable records, barRoso takes a different approach. Its philosophy centers on standardization rather than removal. All herbarium specimens carry potential scientific value, even when incomplete or inconsistently entered. Instead of omitting such records, barRoso focuses on harmonizing fields to enhance comparability across collections. By standardizing collector names, geographic fields, and taxonomic labels, barRoso allows users to flag rather than erase inconsistencies, enabling more transparent workflows and tracing potential misidentifications, especially across distributed duplicates. This inclusive approach honors the archival role of herbaria while facilitating reproducible biodiversity research.


Get Started

Welcome new users! Start learning how to install barRoso

Articles

“How-to” articles to help you learn how to use barRoso

:::

Source Code
---
format:
  html:
    toc: true
    page-layout: custom
resources:
  - figures/
---

<style>
  .floating-logos {
    position: fixed;
    top: 60px;
    right: 450px;
    width: 165px;
    opacity: 0.85;
    z-index: 1;
  }

  .floating-logos img {
    display: block;
    width: 100%;
    margin-bottom: 12px;
    border-radius: 4px;
  }
</style>

<div class="floating-logos">
  <img src='/figures/barroso_hex_sticker.png' alt='jabotR hex sticker'>
  <img src='/figures/jbrj_marca.jpg' alt='JBRJ logo'>
</div>

::: whitebox
::: {style="padding-left: 100px; padding-right: 50px; display: inline-block;"}

::: {layout-ncol="2"}

::: {style="text-align: left;"}

\
`barRoso` is is an R package for standardizing and reconciling plant specimen data from herbarium collections and biodiversity repositories. It was originally developed for cleaning records from [GBIF](https://www.gbif.org) but is fully compatible with other sources, including [SEINet](https://swbiodiversity.org/seinet), [JABOT](https://jabot.jbrj.gov.br/v3/consulta.php), [REFLORA](https://floradobrasil.jbrj.gov.br/reflora), and [speciesLink](https://specieslink.net). The name is both a tribute to Brazilian botanist [Graziela Maciel Barroso](https://www.gov.br/jbrj/pt-br/assuntos/colecoes/arquivistica/graziela-maciel-barroso) and an acronym for **Biodiversity Analysis and Record Reconciliation for Organizing Specimen Observations**.\
\
The package focuses on harmonizing collector names, standardizing taxonomic and geographic fields, and identifying duplicates across distributed herbarium records. It supports Darwin Core standards, integrates seamlessly with tidyverse workflows, and enables reproducible biodiversity data pipelines.\
\
Unlike many packages that rely on static dictionaries to clean collector names, `barRoso` uses a robust set of regular expressions (regex) to dynamically identify and standardize collector patterns across datasets. This approach allows `barRoso` to generalize better across sources and spelling variations, even when names are inconsistently formatted. Combined with harmonization of geographic and taxonomic fields, this makes the package especially powerful for detecting duplicates and preparing data for downstream analysis.\
\
While many data tools prioritize aggressive cleaning, often at the cost of discarding valuable records, `barRoso` takes a different approach. Its philosophy centers on standardization rather than removal. All herbarium specimens carry potential scientific value, even when incomplete or inconsistently entered. Instead of omitting such records, `barRoso` focuses on harmonizing fields to enhance comparability across collections. By standardizing collector names, geographic fields, and taxonomic labels, `barRoso` allows users to flag rather than erase inconsistencies, enabling more transparent workflows and tracing potential misidentifications, especially across distributed duplicates. This inclusive approach honors the archival role of herbaria while facilitating reproducible biodiversity research.\
\
\
:::
::: {style="display: flex; gap: 16px; justify-content: center; align-items: center;"}
:::
:::
:::
:::

::: mainbox
::: {style="padding-left: 100px; padding-right: 100px; display: inline-block;"}
::: {layout-ncol="2"}
::: {style="text-align: center;"}
### [Get Started](/get-started/index.qmd)

Welcome new users! Start learning how to install `barRoso`
:::

::: {style="text-align: center;"}
### [Articles](/articles/index.qmd)

"How-to" articles to help you learn how to use `barRoso`
:::
:::
:::
:::

:::
 
  • About

  • FAQ

  • License

  • Edit this page
  • Report an issue